Prosodically Rich Speech Synthesis Interface Using Limited Data of Celebrity Voice

Authors

Takashi Nose

Taiki Kamei

Abstract

To enhance communication between humans and robots at home in the future, speech synthesis interfaces that can generate expressive speech are indispensable. In addition, synthesizing celebrity voices is commercially important. To address these issues, this paper proposes techniques for synthesizing natural-sounding speech with a rich prosodic personality from a limited amount of data in a text-to-speech (TTS) system. As the target speaker, we chose Shinzo Abe, a well-known prime minister of Japan whose speeches show a good prosodic personality. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purposes, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of these techniques through objective and subjective evaluations.

Similar resources

Data pruning using confidence measures for concatenative synthesis system built using automatically transcribed audio

Today, we can record and store large amounts of single-speaker audio data, and also download it from the web. Generally, these data are prosodically rich and can therefore act as excellent candidates for building concatenative text-to-speech (TTS) systems. But transcriptions for these audio data are often not available and automatic transcriptions are error prone. In addition, these audio data ...

Automatic Construction of a Prosodically Rich Text Corpus for Speech Synthesis Systems

This paper presents a method for an automatic compilation of a phonologically rich text database, which is used in a concatenative text-to-speech (TTS) synthesis system. In this method, linguistic features are predicted from text using Festival’s linguistic engine. A set of phonological units for a specific text is compiled from attribute value lists (AVLs). Phrases/sentences that contain the p...

On building phonetically and prosodically rich speech corpus for text-to-speech synthesis

This paper proposes a way of preparing and recording a speech corpus for unit selection text-to-speech speech synthesis driven by symbolic prosody. The research is focused on a phonetically and prosodically rich sentence selection algorithm. Symbolic description on a deep prosody level is used to enrich the phonetic representation of sentences (by respecting the prosodeme types phones appear in...

Command Speech Interface to Virtual Reality Applications

During the last five years, several attempts to develop speech interfaces, especially for simulation applications, have emerged due to recent improvements in speech and language technology and the complexity of those applications' interfaces. We describe our approach to controlling Virtual Reality applications via voice and GUI, in creation of simple multimodal command speech interface based on dialog mo...

Multi-modal Mathematics: Conveying Math Using Synthetic Speech and Speech Recognition

Over the past decade, the notion of multi-modal access to technology has moved from the realms of science fiction to reality. It is no longer unthinkable to communicate with a machine using voice recognition software, and to have the computer's response spoken in a voice comparable in quality to a human. This paper outlines methodologies for the verbal presentation of mathematical material to a u...

Publication date: 2017
